Word Image Matching in a Methodology for Degraded Text Recognition

Authors

  • Jonathan J. Hull
  • Siamak Khoubyari
  • Tin Kam Ho
Abstract

A technique for the use of global context in text recognition is presented that determines equivalences between word images in a passage of text. Initial hypotheses for the identities of words are then generated by matching the word groups to language statistics that predict the frequency at which certain words will occur. This is followed by a recognition step and a relaxation-based control structure that resolves inconsistencies between several knowledge sources. This paper concentrates on the equivalence determination algorithm. A word matching technique is presented and its performance on a running text of about 1000 word images is determined. Several levels of noise are introduced to simulate different amounts of degradation introduced by fax machines or photocopiers. It is shown that the word matching algorithm maintains its ability to locate small groups of equivalent word images with high reliability even in the presence of noise.
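
The abstract does not give the details of the matching algorithm, so the following is only a minimal sketch of the general idea of grouping equivalent word images and simulating degradation. It assumes word images are same-sized binary grids, uses pixel overlap (Jaccard similarity) as a stand-in matching score, and uses a greedy grouping rule. The function names, the threshold of 0.8, and the random pixel-flip noise model are all illustrative assumptions, not the authors' method.

```python
import random

def jaccard(a, b):
    """Overlap of black pixels between two same-sized binary images.

    Images are lists of rows of 0/1 values (an assumed representation).
    """
    inter = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p | q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return inter / union if union else 1.0

def group_equivalent(images, threshold=0.8):
    """Greedy grouping: an image joins the first group whose first member
    it matches at or above the threshold; otherwise it starts a new group.
    Returns a list of groups, each a list of image indices.
    """
    groups = []
    for i, img in enumerate(images):
        for g in groups:
            if jaccard(images[g[0]], img) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def add_noise(img, flip_prob, rng):
    """Flip each pixel independently with probability flip_prob
    (a crude, assumed stand-in for fax or photocopier degradation)."""
    return [[p ^ 1 if rng.random() < flip_prob else p for p in row]
            for row in img]
```

Calling `group_equivalent` on a list of word images, before and after passing each through `add_noise` at increasing `flip_prob` values, gives a rough way to see how group membership changes as degradation grows. A real system would also need size normalization and alignment of the word images before comparison.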


Similar resources

Word Image Matching as a Technique for Degraded Text Recognition

A technique is presented that determines equivalences between word images in a passage of text. A clustering procedure is applied to group visually similar words. Initial hypotheses for the identities of words are then generated by matching the word groups to language statistics that predict the frequency at which certain words will occur. This is followed by a recognition step that assigns ide...


Integration of Visual Inter-Word Constraints and Linguistic Knowledge in Degraded Text Recognition

Degraded text recognition is a difficult task. Given a noisy text image, a word recognizer can be applied to generate several candidates for each word image. High-level knowledge sources can then be used to select a decision from the candidate set for each word image. In this paper, we propose that visual inter-word constraints can be used to facilitate candidate selection. Visual inter-word const...


Keyword Location in Noisy Document Images

It may be difficult to locate keywords in noisy document images because of degraded OCR performance. A new technique for word image matching has the potential to select those word images in a document that represent potential keywords and to generate improved prototypes for those keywords. No explicit recognition is performed in this process, but better OCR performance will occur on the impr...


Prototype Extraction and Adaptive OCR

To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a ...


Vehicle Logo Recognition Using Image Matching and Textural Features

In recent years, automatic recognition of vehicle logos has become an important issue in modern cities. This is due to the continuing increase in cars and transportation systems, which makes it impossible for humans to manage and monitor them fully. In this research, an automatic real-time logo recognition system for moving cars is introduced based on histogram manipulation. In the proposed...




Publication date: 1992